 Southern Norway


InvertiTune: High-Quality Data Synthesis for Cost-Effective Single-Shot Text-to-Knowledge Graph Generation

Faez, Faezeh, Tahaei, Marzieh S., Hu, Yaochen, Pourranjbar, Ali, Biparva, Mahdi, Coates, Mark, Zhang, Yingxue

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized the ability to understand and generate text, enabling significant progress in automatic knowledge graph construction from text (Text2KG). Many Text2KG methods, however, rely on iterative LLM prompting, making them computationally expensive and prone to overlooking complex relations distributed throughout the text. To address these limitations, we propose InvertiTune, a framework that combines a controlled data generation pipeline with supervised fine-tuning (SFT). Within this framework, the data-generation pipeline systematically extracts subgraphs from large knowledge bases, applies noise filtering, and leverages LLMs to generate corresponding natural text descriptions, a task more aligned with LLM capabilities than direct KG generation from text. This pipeline enables generating datasets composed of longer texts paired with larger KGs that better reflect real-world scenarios compared to existing benchmarks, thus supporting effective SFT of lightweight models for single-shot KG construction. Experimental results on CE12k, a dataset generated using the introduced pipeline, show that InvertiTune outperforms larger non-fine-tuned LLMs as well as state-of-the-art Text2KG approaches, while also demonstrating stronger cross-dataset generalization on CrossEval-1200, a test set created from three established benchmark datasets and CE12k. These findings highlight the importance of realistic, high-quality training data for advancing efficient and high-performing Text2KG systems.


Defining, Understanding, and Detecting Online Toxicity: Challenges and Machine Learning Approaches

Shahi, Gautam Kishore, Majchrzak, Tim A.

arXiv.org Artificial Intelligence

Online toxic content has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. A significant amount of research has been focused on detecting or analyzing toxic content using machine-learning approaches. The proliferation of toxic content across digital platforms has spurred extensive research into automated detection mechanisms, primarily driven by advances in machine learning and natural language processing. Overall, the present study represents the synthesis of 140 publications on different types of toxic content on digital platforms. We present a comprehensive overview of the datasets used in previous studies focusing on definitions, data sources, challenges, and machine learning approaches employed in detecting online toxicity, such as hate speech, offensive language, and harmful discourse. The dataset encompasses content in 32 languages, covering topics such as elections, spontaneous events, and crises. We examine the possibility of using existing cross-platform data to improve the performance of classification models. We present the recommendations and guidelines for new research on online toxic content and the use of content moderation for mitigation. Finally, we present some practical guidelines to mitigate toxic content from online platforms.


Agent-Based Exploration of Recommendation Systems in Misinformation Propagation

Jakobsen, Lise, Holden, Anna Johanne, Gürcan, Önder, Özgöbek, Özlem

arXiv.org Artificial Intelligence

This study uses agent-based modeling to examine the impact of various recommendation algorithms on the propagation of misinformation on online social networks. We simulate a synthetic environment consisting of heterogeneous agents, including regular users, bots, and influencers, interacting through a social network with recommendation systems. We evaluate four recommendation strategies: popularity-based, collaborative filtering, and content-based filtering, along with a random baseline. Our results show that popularity-driven algorithms significantly amplify misinformation, while item-based collaborative filtering and content-based approaches are more effective in limiting exposure to fake content. Item-based collaborative filtering was found to perform better than previously reported in related literature. These findings highlight the role of algorithm design in shaping online information exposure and show that agent-based modeling can be used to gain realistic insight into how misinformation spreads.


Large Language Models for Agent-Based Modelling: Current and possible uses across the modelling cycle

Vanhée, Loïs, Borit, Melania, Siebers, Peer-Olaf, Cremades, Roger, Frantz, Christopher, Gürcan, Önder, Kalvas, František, Kera, Denisa Reshef, Nallur, Vivek, Narasimhan, Kavin, Neumann, Martin

arXiv.org Artificial Intelligence

The emergence of Large Language Models (LLMs) with increasingly sophisticated natural language understanding and generative capabilities has sparked interest in the Agent-based Modelling (ABM) community. With their ability to summarize, generate, analyze, categorize, transcribe and translate text, answer questions, propose explanations, sustain dialogue, extract information from unstructured text, and perform logical reasoning and problem-solving tasks, LLMs have good potential to contribute to the modelling process. After reviewing the current use of LLMs in ABM, this study reflects on the opportunities and challenges of the potential use of LLMs in ABM. It does so by following the modelling cycle, from problem formulation to documentation and communication of model results, while holding a critical stance.


A New HOPE: Domain-agnostic Automatic Evaluation of Text Chunking

Brådland, Henrik, Goodwin, Morten, Andersen, Per-Arne, Nossum, Alexander S., Gupta, Aditya

arXiv.org Artificial Intelligence

Document chunking fundamentally impacts Retrieval-Augmented Generation (RAG) by determining how source materials are segmented before indexing. Despite evidence that Large Language Models (LLMs) are sensitive to the layout and structure of retrieved data, there is currently no framework to analyze the impact of different chunking methods. In this paper, we introduce a novel methodology that defines essential characteristics of the chunking process at three levels: intrinsic passage properties, extrinsic passage properties, and passages-document coherence. We propose HOPE (Holistic Passage Evaluation), a domain-agnostic, automatic evaluation metric that quantifies and aggregates these characteristics. Our empirical evaluations across seven domains demonstrate that the HOPE metric correlates significantly (p > 0.13) with various RAG performance indicators, revealing contrasts between the importance of extrinsic and intrinsic properties of passages. Semantic independence between passages proves essential for system performance with a performance gain of up to 56.2% in factual correctness and 21.1% in answer correctness. On the contrary, traditional assumptions about maintaining concept unity within passages show minimal impact. These findings provide actionable insights for optimizing chunking strategies, thus improving RAG system design to produce more factually correct responses.


UAV Marketplace Simulation Tool for BVLOS Operations

Şerefoğlu, Kıvanç, Gürcan, Önder, Aydoğan, Reyhan

arXiv.org Artificial Intelligence

We present a simulation tool for evaluating team formation in autonomous multi-UAV (Unmanned Aerial Vehicle) missions that operate Beyond Visual Line of Sight (BVLOS). The tool models UAV collaboration and mission execution in dynamic and adversarial conditions, where Byzantine UAVs attempt to disrupt operations. Our tool allows researchers to integrate and compare various team formation strategies in a controlled environment with configurable mission parameters and adversarial behaviors. The log of each simulation run is stored in a structured way along with performance metrics so that statistical analysis can be done straightforwardly. The tool is versatile for testing and improving UAV coordination strategies in real-world applications.


An Investigation into the Causal Mechanism of Political Opinion Dynamics: A Model of Hierarchical Coarse-Graining with Community-Bounded Social Influence

Widler, Valeria, Kaminska, Barbara, Martins, Andre C. R., Puga-Gonzalez, Ivan

arXiv.org Artificial Intelligence

The increasing polarization in democratic societies is an emergent outcome of political opinion dynamics. Yet, the fundamental mechanisms behind the formation of political opinions, from individual beliefs to collective consensus, remain unknown. Understanding that a causal mechanism must account for both bottom-up and top-down influences, we conceptualize political opinion dynamics as hierarchical coarse-graining, where microscale opinions integrate into a macro-scale state variable. Using the CODA (Continuous Opinions Discrete Actions) model, we simulate Bayesian opinion updating, social identity-based information integration, and migration between social identity groups to represent higher-level connectivity. This results in coarse-graining across micro, meso, and macro levels. Our findings show that higher-level connectivity shapes information integration, yielding three regimes: independent (disconnected, local convergence), parallel (fast, global convergence), and iterative (slow, stepwise convergence). In the iterative regime, low connectivity fosters transient diversity, indicating an informed consensus. In all regimes, time-scale separation leads to downward causation, where agents converge on the aggregate majority choice, driving consensus. Critically, any degree of coherent higher-level information integration can overcome misalignment via global downward causation. The results highlight how emergent properties of the causal mechanism, such as downward causation, are essential for consensus and may inform more precise investigations into polarized political discourse.
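
The Bayesian opinion updating mentioned in the abstract above can be illustrated with a minimal sketch of a CODA-style update. It assumes the commonly described symmetric-likelihood form of CODA, in which an agent tracks the log-odds that option A is the better choice and shifts them by a fixed step each time it observes a neighbor's discrete action. The function names and the value of `alpha` are hypothetical and chosen for illustration; they are not taken from the paper, and the sketch does not cover the paper's social identity groups or migration.

```python
import math


def coda_update(log_odds: float, neighbor_action: int, alpha: float = 0.7) -> float:
    """Return the agent's updated log-odds that option A is best.

    neighbor_action: +1 if the observed neighbor chose A, -1 if B.
    alpha: assumed probability that a neighbor picks the better option
           (illustrative value, must lie strictly between 0.5 and 1).
    """
    if not 0.5 < alpha < 1.0:
        raise ValueError("alpha must be in (0.5, 1)")
    if neighbor_action not in (1, -1):
        raise ValueError("neighbor_action must be +1 or -1")
    # Bayes' rule in log-odds form: add the log-likelihood ratio of the observation.
    step = math.log(alpha / (1.0 - alpha))
    return log_odds + neighbor_action * step


def action(log_odds: float) -> int:
    """Discrete action shown to others: +1 (A) if the agent leans to A, else -1 (B)."""
    return 1 if log_odds >= 0 else -1


def probability(log_odds: float) -> float:
    """Convert log-odds back to the probability that A is best."""
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    nu = 0.0  # undecided agent
    for observed in (1, 1, -1):
        nu = coda_update(nu, observed)
    print(action(nu), round(probability(nu), 3))
```

Working in log-odds keeps each update a simple addition, so opposite observations cancel exactly and the continuous opinion stays hidden behind the discrete action that neighbors see.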


Generating Spatial Synthetic Populations Using Wasserstein Generative Adversarial Network: A Case Study with EU-SILC Data for Helsinki and Thessaloniki

Falck, Vanja

arXiv.org Artificial Intelligence

Using agent-based social simulations can enhance our understanding of urban planning, public health, and economic forecasting. Realistic synthetic populations with numerous attributes strengthen these simulations. The Wasserstein Generative Adversarial Network, trained on census data like EU-SILC, can create robust synthetic populations. These methods, aided by external statistics or EU-SILC weights, generate spatial synthetic populations for agent-based models. The increased access to high-quality micro-data has sparked interest in synthetic populations, which preserve demographic profiles and analytical strength while ensuring privacy and preventing discrimination. This study uses national data from Finland and Greece for Helsinki and Thessaloniki to explore balanced spatial synthetic population generation. Results show challenges related to balancing data with or without aggregated statistics for the target population and the general under-representation of fringe profiles by deep generative methods. The latter can lead to discrimination in agent-based simulations.


Multimodal AI on Wound Images and Clinical Notes for Home Patient Referral

Fard, Reza Saadati, Agu, Emmanuel, Busaranuvong, Palawat, Kumar, Deepak, Gautam, Shefalika, Tulu, Bengisu, Strong, Diane

arXiv.org Artificial Intelligence

Chronic wounds affect 8.5 million Americans, particularly the elderly and patients with diabetes. These wounds can take up to nine months to heal, making regular care essential to ensure healing and prevent severe outcomes like limb amputations. Many patients receive care at home from visiting nurses with varying levels of wound expertise, leading to inconsistent care. Problematic, non-healing wounds should be referred to wound specialists, but referral decisions in non-clinical settings are often erroneous, delayed, or unnecessary. This paper introduces the Deep Multimodal Wound Assessment Tool (DM-WAT), a machine learning framework designed to assist visiting nurses in deciding whether to refer chronic wound patients. DM-WAT analyzes smartphone-captured wound images and clinical notes from Electronic Health Records (EHRs). It uses DeiT-Base-Distilled, a Vision Transformer (ViT), to extract visual features from images and DeBERTa-base to extract text features from clinical notes. DM-WAT combines visual and text features using an intermediate fusion approach. To address challenges posed by a small and imbalanced dataset, it integrates image and text augmentation with transfer learning to achieve high performance. In evaluations, DM-WAT achieved 77% accuracy (std 3%) and a 70% F1 score (std 2%), outperforming prior approaches. Score-CAM and Captum interpretation algorithms provide insights into specific parts of image and text inputs that influence recommendations, enhancing interpretability and trust.


Towards resilient cities: A hybrid simulation framework for risk mitigation through data driven decision making

Carraminana, David, Bernardos, Ana M., Besada, Juan A., Casar, Jose R.

arXiv.org Artificial Intelligence

Providing a comprehensive view of the city operation and offering useful metrics for decision making is a well-known challenge for urban risk analysis systems. Existing systems are, in many cases, generalizations of previous domain-specific tools and/or methodologies that may not cover all urban interdependencies, which makes it difficult to obtain homogeneous indicators. To overcome this limitation while seeking effective support for decision makers, this article introduces a novel hybrid simulation framework for risk mitigation. The framework is built on a proposed city concept that considers urban space as a Complex Adaptive System composed of interconnected Critical Infrastructures (CIs). In this concept, a Social System, which models daily patterns and social interactions of the citizens in the Urban Landscape, drives the CIs' demand to configure the full city picture. The framework's hybrid design integrates agent-based and network-based modeling by breaking down city agents into system-dependent subagents, to enable both inter- and intra-system interaction simulation, respectively. A layered structure of indicators at different aggregation levels is also developed, to ensure that decisions are not only data-driven but also explainable. Therefore, the proposed simulation framework can serve as a decision support system (DSS) tool that allows the quantitative analysis of the impact of threats at different levels. First, system-level metrics can be used to get a broad view of the city's resilience. Then, agent-level metrics back those figures and provide better explainability. On the implementation side, the proposed framework enables component reusability (for easier coding), simulation federation (enabling the integration of existing system-oriented simulators), discrete simulation in accelerated time (for rapid scenario simulation) and decision-oriented visualization (for informed outputs).